fix: harden reassociation barriers for fast-math#1276
fix: harden reassociation barriers for fast-math#1276DiamonDinoia wants to merge 1 commit intoxtensor-stack:masterfrom
Conversation
91b4081 to
f9a9992
Compare
f9a9992 to
2f9a431
Compare
|
@serge-sans-paille this is also ready for review. |
| cd _build && cmake --build . | ||
| - name: Testing xsimd | ||
| run: | | ||
| cd _build && ./test/test_xsimd |
There was a problem hiding this comment.
It would be nice to split that part in a separate commit (even PR!). It's independent from the fast-math issue, right?
There was a problem hiding this comment.
yes we can, though the fast math part tough will fail without this PR.
| cd _build && cmake .. -DBUILD_TESTS=ON -DDOWNLOAD_DOCTEST=ON -DBUILD_BENCHMARK=ON -DBUILD_EXAMPLES=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_CXX_FLAGS="/arch:AVX2" -G Ninja | ||
| - name: Build | ||
| run: | | ||
| cd _build && cmake --build . |
There was a problem hiding this comment.
nit: could be cmake --build -B _build
| - name: Setup Ninja | ||
| run: | | ||
| python3 -m pip install --upgrade pip setuptools wheel | ||
| python3 -m pip install ninja |
There was a problem hiding this comment.
given the redundancy between that job and the one above, it seems better to use github strategy.
| XSIMD_INLINE void reassociation_barrier(T& x, memory_barrier_tag) noexcept; | ||
|
|
||
| template <class T, class A> | ||
| XSIMD_INLINE void reassociation_barrier(T& x, A const&) noexcept; |
There was a problem hiding this comment.
Why this dispatching based on the architecture?
There was a problem hiding this comment.
Based on the usage, I wonder if we should enforce a string literal as last argument, that explains the reason why we generate that barrier. the string would be ignored, but it enforces the documentation.
There was a problem hiding this comment.
Because arm/neon have different asm instructions or might not be supported at all. I just thought this would be a way to avoid ifdefs. Potentially reassociation_barrier can be guarded by xsimd_with_arch and we get a different function with different ifdefs. I have not investigated where to put such function (in which header) and how the code looks like. This felt xsimd idiomatic.
I am okay with a [[maybe_unused]] string literal argument to this function for documentation purposes.
There is a but in nearbyint
-fassociative-mathbreaks it as it does not define__FAST_MATH__.Also the barrier used was causing a stack spill.
I centralized a barrier function that we can use everywhere in the code and used it in the places I know it helps.
Now we can just use
reassociation_barrierto avoid compiler reordering of instructions.Let me know if you like the internal API and if you need changes to it. I find that this is the solution that minimizes ifdef boilerplate and allows to dispatch to all archs. (With c++17 this will be simpler).
Cheers,
Marco